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We consider the problem of partitioning the nodes of a graph with costs 
on its edges into subsets of given sizes so as to minimize the sum of the costs 
on all edges cut. This problem arises in several physical situations — for 
example, in assigning the components of electronic circuits to circuit 
boards to minimize the number of connections between boards. 

This paper presents a heuristic method for partitioning arbitrary graphs 
which is both effective in finding optimal partitions, and fast enough to be 
practical in solving large problems. 

I. INTRODUCTION 

1.1 Definition of the Problem 

This paper deals with the following combinatorial problem: given a 
graph G with costs on its edges, partition the nodes of G into subsets no 
larger than a given maximum size, so as to minimize the total cost of 
the edges cut. 

One important practical example of this problem is placing the com- 
ponents of an electronic circuit onto printed circuit cards or substrates, 
so as to minimize the number of connections between cards. The com- 
ponents are the nodes of the graph, and the circuit connections are the 
edges. There is some maximum number of components which may be 
placed on any card. Since connections between cards have high cost 
compared to connections within a board, the object is to minimize the 
number of interconnections between cards. 

This partitioning problem also arises naturally in an attempt to 
improve the paging properties of programs for use in computers with 
paged memory organization. A program (at least statically) can be 
thought of as a set of connected entities. The entities might be sub- 
routines, or procedure blocks, or single instruction and data items, 
depending on viewpoint and the level of detail required. The connections 
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between the entities might represent possible flow or transfer of control, 
or references from one entity to another. The problem is to assign the 
objects to "pages" of a given size so as to minimize the number of 
references between objects which lie on different pages. 

To pose the partitioning problem mathematically, we shall need the 
following definitions. Let G be a graph of n nodes, of sizes (weights) 
w % > 0, * = 1, • • • , n. Let p be a positive number, such that < w, ^ p 
for all i. Let C = (c,,), i, j = 1, • • • , n be a weighted connectivity matrix 
describing the edges of G. 

Let k be a positive integer. A k-way partition of G is a set of nonempty, 
pairwise disjoint subsets oi G, v t , • ■ ■ , v k such that W* =1 i>,- = G. A 
partition is admissible if 

| v t | ^ p for all i, 

where the symbol | x | stands for the size of a set x, and equals the sum 
of the sizes of all the elements of x. The cost of a partition is the summa- 
tion of dj over all i and j such that i and ; are in different subsets. The 
cost is thus the sum of all external costs in the partition. 

The partitioning problem we consider here is to find a minimal-cost 
admissible partition of G. 

There are three other problems which are equivalent to this one. 
First, minimizing external cost is equivalent to maximizing internal 
cost because the total cost of all edges is constant. Further, by changing 
the signs of all c,/s, we can maximize external cost, or minimize internal 
cost. 

1.2 Exact Solutions 

A strictly exhaustive procedure for finding the minimal cost partition 
is often out of the question. To see this suppose that G has n nodes of 
size 1 to be partitioned into k subsets of size p, where kp = n. Then there 
are (l) ways of choosing the first subset, (""") ways for the second, 
and so on. Since the ordering of the subsets is immaterial, the number 
of cases is 



m 



n ~ p ) ... ( 2 p)(p 
p/ \ p j \p j\p 



For most values of n, k, and p, this expression yields a very large num- 
ber; for example, for n = 40 and p = 10 (A; = 4), it is greater than 10 20 . 

Formally the problem could also be solved as an integer linear pro- 
gramming problem, with a large number of constraint equations neces- 
sary to express the uniformity of the partition. 

Because it seems likely that any direct approach to finding an optimal 
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solution will require an inordinate amount of computation, we turn to 
an examination of heuristics. Heuristic methods can produce good 
solutions (possibly even an optimal solution) quickly. Often in practical 
applications, several good solutions are of more value than one optimal 
one. 

The first and foremost consideration in developing heuristics for 
combinatorial problems of this type is finding a procedure that is power- 
ful and yet sufficiently fast to be practical. A process whose running 
time grows exponentially or factorially with the number of vertices of 
the graph is not likely to be practical. In most cases, a growth rate of 
more than the square of the number of vertices is still not too practical. 
(If the running time of a procedure grows as f(n), where n is the number 
of vertices involved, we shall refer to it as an f (n)-procedure.) 

1.3 False Starts 

To point out a few pitfalls, we mention some unsuccessful attempts at 
heuristic solutions to the partitioning problem. 

1.3.1 Random, Solutions 

One tactic is simply to generate random solutions, keeping the best 
seen to date, and terminating after some predetermined time or value 
is reached. This is quite fast, although actually an n"-procedure. Un- 
fortunately, this approach is unsatisfactory for problems of even 
moderate size, since there are generally few optimal or near-optimal 
solutions, which thus appear randomly with very low probabilities. 
Experience with 2-way partitions for a class of 0-1 matrices of size 
32 X 32, for example, has indicated that there are typically 3 to 5 
optimal partitions, out of a total of ^ (\l) partitions, giving a probability 
of success on any trial of less than 10 -7 . 

1.3.2 Max Flow-Min Cut 

Another partitioning method is the Ford and Fulkerson max flow-min 
cut algorithm 1 . The graph is treated as a network in which edge costs 
correspond to maximum flow capacities between pairs of nodes. A cut 
is a separation of the nodes into two disjoint subsets. The max flow-min 
cut theorem states that the maximal flow values between any pair of 
nodes is equal to the minimal cut capacity of all cuts which separate 
the two nodes. In our terminology, a cut is a 2-way partition, and the 
cut capacity is the cost of the partition. The Ford and Fulkerson algo- 
rithm finds a cut with maximal flow, which is thus a minimal cost cut; 
this represents a minimum cost partition of the graph into two subsets 
of unspecified sizes. 
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There are several difficulties involved in using the Ford and Fulkerson 
algorithm for our partitioning problem. The most severe of these is the 
fact that the algorithm has no provision for constraining the sizes of the 
resultant subsets, and there seems to be no obvious way to extend it 
to include this. Thus if flow methods are used to perform a split, then 
further processing is necessary to make the resulting subsets the correct 
size. If the subsets are greatly different in size, then use of this algorithm 
will have produced essentially no benefit. Hence in spite of its theoretical 
elegance, the Ford and Fulkerson algorithm is not suitable for this 
application. (Note however, that since it does find the minimal cost 
unconstrained 2-way partition, the value it produces is a lower bound 
for solutions produced by any method.) 

1.3.3 Clustering 

A class of much more intuitive methods is based on identifying "nat- 
ural clusters" in the given cost matrix — that is, groups of nodes which 
are strongly connected in some sense. For example, one can use very 
simple heuristics for building up clusters, based on collecting together 
elements corresponding to large values in the cost matrix. But again 
these methods do not in general include much provision for satisfying 
constraints on the sizes of the subsets, nor do they provide for syste- 
matic assignment of "stragglers" (nodes which do not obviously belong 
to any particular subset) . 

1.3.4 \-Opting 

Lin, working on the Traveling Salesman Problem, [See Ref. 2] cate- 
gorized a set of methods of improving given solutions by rearranging 
single links, double links, triplets, and in general, X links. He referred 
to a change involving the movement of X links as a X-change. If a con- 
figuration of the system is reached in which no X-change can be made 
which results in a decrease in cost, the configuration is said to be "X-opt." 

For the partitioning problem, an analogous operation is the inter- 
change of groups of X points between a pair of sets. Thus a 1-change is 
the exchange of a single point in one set with a single point in another 
set. A configuration is then said to be "1-opt" if there exists no inter- 
change of two points which decreases the cost of the partition. Experi- 
ments to evaluate 1-opting for 2-way partitions of 0-1 matrices (32 X 32) 
within which about one-half of the elements were nonzero, show that 
apparently optimal values can be achieved in about 10 percent of the 
trials; values within 1 or 2 of the optimal can be achieved in about 75 
percent of cases. 
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It appears fruitless to extend X beyond 1 (1-opting is already an 
n. 2 -procedure), or to extend 1-opting experiments to partitions into 
more than two subsets, since more powerful methods have been devel- 
oped. These methods are the topic of the next sections. 

II. TWO-WAY UNIFORM PARTITIONS 

2.1 Introduction 

The simplest partitioning problem which still contains all the sig- 
nificant features of larger problems is that of finding a minimal-cost 
partition of a given graph of 2n vertices (of equal size) into two subsets 
of n vertices each. The solution of the 2-way partitioning problem is the 
subject of this section. The solution provides the basis for solving more 
general partitioning problems. In Section 2.6, we discuss 2-way partitions 
into sets of unequal size. 

Let S be a set of 2n points, with an associated cost matrix C = (c,,), 
i j = l, • • • , In. We assume without loss of generality that C is a 
symmetric matrix, and that c,, = for all i. There is no assumption 
about nonnegativity of the c,,'s. We wish to partition S into two sets 
A and B, each with n points, such that the "external cost" T = ^T, A xb c ab 
is minimized. 

In essence, the method is this: starting with any arbitrary partition 
A, B of S, try to decrease the initial external cost T by a series of inter- 
changes of subsets of A and B; the subsets are chosen by an algorithm 
to be described. When no further improvement is possible, the resulting 
partition A', B' is locally minimum with respect to the algorithm. We 
shall indicate that the resulting partition has a fairly high probability 
of being a globally minimum partition. 

This process can then be repeated with the generation of another 
arbitrary starting partition A, B, and so on, to obtain as many locally 
minimum partitions as we desire. 

Given £ and (c,,), suppose A*, B* is a minimum cost 2-way partition. 
Let A, B be any arbitrary 2-way partition. Then clearly there are subsets 
X C A, Y C B with | X \ — \ Y | g£ w/2 such that interchanging X and 
Y produces A* and B* as shown below. 





B*=B-Y +X 
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The problem is to identify X and Y from A and B, without considering 
all possible choices. The process we describe finds X and Y approxi- 
mately, by sequentially identifying their elements. 
Let us define for each at A, an external cost E a by 

E a = £c a , 

mil 

and an internal cost I a by 

ztA 

Similarly, define E b , I b for each bzB. Let D t = E z — I, for all zt S; D, 
is the difference between external and internal costs. 

Lemma 1: Consider any at A, b t B. If a and b are interchanged, the 
gain (that is, the reduction in cost) is precisely D a + D b — 2c ab . 

Proof: Let z be the total cost due to all connections between A and 
B that do not involve a or b. Then 

T = z + E n + E b - c ab . 

Exchange a and b; let T' be the new cost. We obtain 

T' = z + I a + /, + c ab 

and so 

gain = old cost — new cost = T — T' 

= D a + D„ - 2c ab . 

2.2 Phase 1 Optimization Algorithm 

In this subsection we present the algorithm for 2-way partitioning. 
First, compute the D values for all elements of S. Second, choose 
a, f t A, b, t B such that 

<7i = D a . + D bl — 2c aibj 

is maximum; a,- and 6,- correspond to the largest possible gain from a 
single interchange. (We will return shortly to a discussion of how to 
select a, and 6, quickly.) Set a, and 6,- aside temporarily, and call them 
a[ and b[ , respectively. 

Third, recalculate the D values for the elements of A — {a,} and 
for£ - {&,}, by 

D' x = D x + 2c xai - 2c xbi xzA- |a.}, 

D f „ = D„ + 2c ubi - 2c uai ytB - {6,-}. 
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The correctness of these expressions is easily verified: the edge (x, a,) 
is counted as internal in D x , and it is to be external in D' x , so c xai must 
be added twice to make this correct. Similarly, c xb , must be subtracted 
twice to convert (.r, &,-) from external to internal. 

Now repeat the second step, choosing a pair a' 2 , b' 2 from A — \a\\ 
and B — \b[\ such that g 2 = £>„,• + D bt - — 2c a ,- bt . is maximum (a[ 
and b[ are not considered in this choice). Thus g 2 is the additional gain 
when the points a' 2 and b 2 are exchanged as well as a[ and b[ ; this ad- 
ditional gain is maximum, given the previous choices. Set a 2 and b! 2 
aside also. 

Continue until all nodes have been exhausted, identifying (a£ , &£),• • •, 
(a'„ , b' n ), and the corresponding maximum gains gr 3 , • • • , g„ . As each 
(a', 6') pair is identified, it is removed from contention for further choices 
so the size of the sets being considered decreases by 1 each time an 
(a', b') is selected. 

If X = a[ , a 2 , ■ ■ ■ , a' k , Y = b[ . b' 2 , • • • , b' k , then the decrease in 
cost when the sets X and Y are interchanged is precisely Qi + g 2 + 

+ f/ fc . Of course 2" f/- = °- ^ ote ^ iat some °^ tne 0'"' s are ne g ativG » 
unless all are zero. ^_ 

Choose k to maximize the partial sum S*-i (h = G. Now if G > 0, 
a reduction in cost of value (r can be made by interchanging X and F. 
After this is done, the resulting partition is treated as the initial partition, 
and the procedure is repeated from the first step. 

If G = 0, we have arrived at a locally optimum partition, which we 
shall call a phase 1 optimal partition. We now have the choice of re- 
peating with another starting partition, or of trying to improve the phase 
1 optimal partition. We shall discuss the latter option shortly. Figure 1 
is a flowchart for the phase 1 optimization procedure. 

2.3 Effectiveness of the Procedure 

One general approach to solving problems such as this one is to find 
the best exchange involving say X pairs of points, for some X specified 
in advance 2 . The difficulty encountered is that use of a small value of 
X is not sufficient to identify good exchanges, but the computational 
effort required grows rapidly as X increases. 

The procedure we have described sequentially finds an approximation 
to the best exchange of X pairs. X is not specified in advance, but rather 
is chosen to make the improvement as large as possible. This technique 
sacrifices a certain amount of power for a considerable gain in speed. 

Since we construct a sequence of gains r/, , i = 1, • ■ ■ , n, and find the 
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Fig. 1 — Flowchart of phase 1 optimization procedure. 



maximum partial sum, the process does not terminate immediately when 
some Qi is negative. This means that the process can sequentially identify 
sets for which the exchange of only a few elements would actually in- 
crease the cost, while the interchange of the entire sets produces a net 
gain. 

Numerous experiments have been performed to evaluate the pro- 
cedure on different types of cost matrices. The matrices used have in- 
cluded (i) 0-1 matrices, with density of nonzero elements ranging from 
5 percent to 50 percent, (ii) integer matrices with elements uniformly 
distributed on [0, k], k = 2, • • • , 10, (Hi) matrices with clusters of 
known sizes and binding strength. Results on all of these matrices have 
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been similar, so we shall only summarize them here. A more extended 
discussion may be found in Ref. 3. 

A useful measure of the power of a heuristic procedure is the proba- 
bility that it finds an optimal solution in a single trial. Suppose that p 
is the probability that a phase 1 optimal solution found using a random 
starting partition is globally optimal. We have examined the behavior 
of this probability as the size of the matrices involved is varied. Experi- 
ments show p is around 0.5 for matrices of size 30 X 30, 0.2 to 0.3 for 
60 X 60, and 0.05 to 0.1 for 120 X 120. The functional behavior of p 
is approximately p(n) = 2" n/3 °. 

These values are derived primarily from 0-1 matrices having about 
50 percent l's (randomly placed). Experiments on matrices with lower 
densities of l's yield larger variances, but substantially identical mean 
values for p. 

2.4 Running Time oj the Procedure 

Let us define a pass to be the operations involved in making one cycle 
of identification of {a[ , b[), • ■ • , (a' n , b'„), and selection of sets X and 
Y to be exchanged. The total time for a pass can be estimated this way. 
First, the computation of the D values initially is an ?i 2 -procedure, since 
for each element of S, all the other elements of S must be considered. 
The time required for updating the D values is proportional to the 
number of values to be updated, so the total updating time in one pass 
grows as 

(n - 1) + (n - 2) + • • • + 2 + 1 

which is proportional to n 2 . 

The dominant time factor is the selection of the next pair a' { , b' { to 
be exchanged. The method we have used to perform this searching is to 
sort the D values so that 

D ai ^ D a , ^ • • • ^ D an 

and 

D bl ^ D b , ^ • • • ^ D b . . 

When sorting is used, only a few likely contenders for a maximum gain 
need be considered. This is because when scanning down the set of Z)„'s 
and D b 's, if a pair D ai , D bi is found whose sum does not exceed the 
maximum improvement seen so far in this pass, then there cannot be 
another pair a k , b t with fc ^ i, I ^ ;', with a greater gain, (assuming 
Ca ^ 0) and so the scanning can be terminated. Thus the next pair 
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for interchange is found rapidly. Sorting is an n log n operation, so in 
this method, the total time required to sort D values in a pass will be 
approximately 

n log n + (n — 1) log (n — 1) + • • • + 2 log 2 

which grows as n 2 log n. 

To reduce the time for selection of an (a, b) pair, it is possible to use 
techniques which are faster than sorting, but which do not necessarily 
always give the maximum gain at each stage. For example, one method 
is to scan for the largest D a and the largest D b , and use the correspond- 
ing a and b as the next interchange. This method is essentially linear- 
time and would probably be implemented as part of the recomputation 
of the D values. It is best suited for sparse matrices, where the prob- 
ability that c ah > is small. A slight extension, involving negligible 
extra cost, is to save the largest two or three D a 's and D b 's, so that if 
the largest pair does not give the maximum gain (because c ah is too 
large), then another can be tried. Experience indicates that three values 
are sufficient in virtually all cases, even for matrices with a relatively 
high percentage of nonzero entries. Use of this method reduces running 
time by about 30 percent in the present implementation, with very 
small degradation of power. 

The number of passes required before a phase 1 optimal partition is 
achieved is small. On all matrix sizes tested at the time of writing (up 
to 360 points), it has been almost always from 2 to 4 passes. On the basis 
of this experimental evidence, the number of passes is not strongs- 
dependent on the value of n. 

From the foregoing observations, it is possible to estimate the total 
running time of the procedure. If we use a method which sorts the D 
values at each stage (time proportional to n 2 log ft), then the running 
time should grow as n 2 log n. If a fast-scan method is used, and the 
number of passes is constant, the running time should have an n 2 
growth rate; this is a lower bound. 

For comparison, examination of all pairs of sets X and Y, and evalua- 
tion of the costs would require time proportional to 

n/2 /\2 2 n /\2 

rf (2n 
~ 2 I 

2 \im 
for large n. This function grows as w 3/2 4". 
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Running times have been plotted in Fig. 2. The observed times have 
an apparent growth rate of about n 2 ' 4 , which is reasonably close to 
n 2 . Although on the logarithmic plot this curve is close to linear over 
the range n = 20 to n = 130, it may actually be n 2 log n; insufficient 
data is available to check this. All times are based on an implementation 
in FORTRAN G on an IBM System 360 Model 65. 

2.5 Improving the Phase 1 Optimal Partition 

In this section, we discuss a method which might be used to improve 
the partition produced by the phase 1 procedure, which may not be 
globally optimum. The method suggested in this section is based heavily 
on experimental evidence, although there are quite plausible reasons 
for performing the particular set of operations. The basic idea is to 
perturb the locally optimal solution in what we hope is an enlightened 
manner, so that an iteration of the process on the perturbed solution 
will yield a further reduction in the total cost. If this tactic fails, nothing 
has been lost except some computation time, since the best solution 
seen so far is always saved. 

Computer results for problems with up to 64 points suggest that 
whenever a phase 1 optimal solution is not globally optimal, | X \ = 
I Y | tt n/2. Roughly, this implies that if | X \ and | Y | had been small 
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Fig. 2 — Running time. 
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compared to n/2, they would have been found by the process; it is only 
larger sets which are not identified all the time. 

A successful heuristic to find the correct X and Y in this case is to 
find a phase 1 optimal partition for each of the sets A and B, say A — > 
[At , A 2 ] and B — > [2?, , B 2 ]. (That is, find near-optimal partitions of A 
and of B separately.) Recombine the 4 sets into 2, say 4 = ^i ^ ^ 
and i?o = A 2 \J B 2 , and continue with phase 1 optimization. If our 
expectation is correct, the new X and Y will be small, and thus readily 
identified by the phase 1 process. 

When A is split into A x , A 2 and B into Bi , B 2 there are two ways in 
which the smaller sets can be recombined. A series of tests was made on 
matrices of moderate size (up to 64 X 64) , in which both possible recom- 
binations were done, generating three phase 1 optimal values for each 
starting partition. For matrices of size 32 X 32, the apparent optimal 
value was observed at least once in each triple of values, for a large 
number of cases. With matrices of size 64 X 64, there were occasional 
failures. 

It might be noted that the extra time involved for the recombination 
approach is three times that required to do a completely new partition 
from a random start, assuming an rT-procedure. 

It is possible to estimate whether a particular improvement tactic 
is profitable or not in the following way. Suppose that some method 
increases the probability of finding an optimal partition from p to p' , 
while it increases the running time from t to t'. Then in a fixed amount 
of time, it is possible to do k trials of the basic procedure, and kt/t' 
trials of the improved method. The corresponding probabilities of 
achieving an optimal solution are 1 — (1 — p) k and 1 — (1 — p') kt/t 
respectively. The improved method is then desirable if the second ex- 
pression is greater than the first; by simple manipulation, this condition 
becomes 



1 - p' < (1 - p) 



t'/t 



On the basis of the numerical values in this section, it may be useful to 
try the recombination method. 

2.6 Partitioning into Unequal-Sized Sets 

It is simple to modify the procedure to partition a set S with n ele- 
ments into two sets of specified sizes n Y and n 2 (n, + n 2 = n). Assume 
n, < n 2 . Then restrict the maximum number of pairs that can be ex- 
changed in one pass of the procedure to n x . All other operations are 
performed on all elements of each set. (The starting partition is into 
two sets, of rii and n 2 elements respectively.) 
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Suppose we wish to partition S into two sets, such that there are at 
least n x elements and at most n a elements in each subset; n x + n 2 = n, 
but they are not specified further. 

The procedure is easily modified to handle this sort of constraint 
by the addition of "dummy" elements. These are elements which have 
no connections whatsoever; that is, they have zero entries in the cost 
matrix wherever they appear. Add 2n 2 — W dummies so S has 2n 2 
elements, and perform the procedure on it. The resulting partition will 
assign the dummy elements to the two subsets so as to minimize the 
external cost; at this point the dummies are discarded, leaving a parti- 
tion into two subsets that satisfy the size constraints given. 

2.7 Elements of Unequal Sizes 

We have made the assumption so far that the elements (vertices) of 
the graph are all of the same size. This requirement may be relaxed to 
a large extent by converting any node of size k > 1 to a cluster of k nodes 
of size 1, bound together by edges of appropriately high cost. The size 
of the problem will obviously increase proportionally to the value of k, 
so it may be necessary to sacrifice some accuracy to keep the number of 
generated nodes within reasonable bounds. 

III. MULTIPLE-WAY PARTITIONS 

3.1 Reduction to 2-Way Partitioning Problem 

So far, the discussion has been concerned exclusively with the basic 
problem of performing a 2- way partition on a set of In objects. In this 
section we extend the technique to perform /c-way partitions on a set of 
kn objects, using the 2-way procedure as a tool. 

The essential idea is to start with some partition into k sets of size 
n and by repeated application of the 2-way partitioning procedure to 
pairs of subsets, make the partition as close as possible to being pairwise 
optimal. (Section 3.2 treats the question of what starting sets to use.) 
Of course pairwise optimality is only a necessary condition for global 
optimality. There may be situations where some complex interchange 
of three or more items from three or more subsets is required to reduce 
a pairwise optimal solution to globally optimum; at the moment, no 
reasonable method for identifying such sets is known. 

There are ( 2 ) pairs of subsets to consider, so the time for one pass 
through all pairs is (assuming an n 2 -procedure) ( 2 )n 2 « (kn) 2 /2 = 
(number of points) 2 /2. In general, more passes than this will actually 
be required, since when two sets are made optimal, this may change their 
optimality with respect to other sets. 
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Experience indicates that the number of passes is small and the process 
converges quickly. For example, our algorithm selects (i, j) as the next 
pair of sets to be optimized, where either /' or j has been changed since 
the last time the pair (i, j) was selected. Using this selection process, 
the average number of passes through each pair of sets is a slowly grow- 
ing function of both k and n. For matrices of size 100 or less and k < 6, 
the number of passes has been less than 5. [The average number of 
passes is computed as the average number of pairs considered to reach 
pairwise optimality, normalized by (2).] 

In any particular trial, there is a correlation between the number of 
pairs selected and the quality of the final partition. To get a better 
solution requires more work. 

Convergence is rapid: two passes account for more than 95 percent 
of the improvement in most cases; the remaining passes contribute only 
small further reductions. Let p(n, k) be the proportion of minimum cost 
solutions found for a particular n and k. For k fixed and small compared 
to n, the functional behavior of p(n, k) is similar to the case k = 2, but 
the actual values are lower. Roughly, we observe p(n } k + 1) FH §p(n, k) 
for k in the range 2-4, and n up to 100, with considerable variation 
depending on the matrix being tested. For instance, for matrices of 
size about 40, p(40, 2) « 0.4, p(42, 3) « 0.2, and p(40, 4) « 0.1. 

Another interesting question is measurement of how close to optimum 
the partitions found are. The solutions obtained by pairwise optimiza- 
tion have values concentrated in a narrow range. In almost all cases, the 
largest value found by the procedure is within 4-5 percent of the smallest. 
As another measure, if c is the mean cost of random partitions and b is 
the cost of the best partition observed, then virtually all partitions 
found have values v such that 

v - b ^ 0.1(c - b). 

For instance, one test case was a series of 4-way partitions of a 0-1 
matrix of size 80. This matrix had 1278 nonzero entries (a density of 
0.2), corresponding to 639 edges in the graph. The mean value of 
randomly chosen partitions was 480.6. Twenty-four partitions of this 
matrix were found using the method described above. The lowest 
value encountered was 352 (1 time), the highest 365 (1 time); the mean 
value was 359.5, the median 360. 

3.2 Starting Partition 

In this subsection we discuss various methods of generating good 
starting partitions, based on modifications of the basic procedure. 
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The primary reason for choosing good starting partitions is that this 
particular form of preprocessing reduces the amount of work required 
to make the system pairwise optimal. It may also make the probability 
of an optimal solution higher, although this tendency is very difficult to 
evaluate. 

Several methods for finding good multi-way starting partitions which 
are based on repeated application of the procedure itself have been 
investigated. The essential idea is to generate a fc-way starting partition 
by first forming an r-way partition, then an s-way partition on each of 
the resulting subsets, and so on, up to /-way. (Here k = rs • • • I) The 
partitions found this way will in general be better than those which are 
completely arbitrary. A pairwise optimization stage is applied to the 
final set of subsets. 

For example, if k is a power of 2, then perform a 2-way split, then a 
2- way split on each of these subsets, and so on until the desired size of 
subsets is found. 

This general approach is prone to the following difficulty: the first 
split divides the original set into r subsets by trying to make the internal 
connections in each subset as large as possible. Obviously this may con- 
flict directly with the next stage, which is to try to divide each subset 
further. Carried to several levels, it can lead to a relatively poor overall 
solution. In experiments with 4-way partitions of matrices of sizes up to 
64 X 64, this method yields optimal solutions approximately as often 
as does starting with a 4-way partition in the first place. In addition, 
this method will be effective if the matrix happens to have natural 
clusters of approximately the correct size (that is, equal to the final 
subset size). 

A second method which can be used is to partition the set of kn ele- 
ments into a set of n and a set of (k — l)n, using the slightly modified 
version of the basic procedure discussed in the first part of Section 2.6. 
The set of n elements is set aside, and the next n elements from the 
remaining (k — l)n are identified. This continues until k subsets have 
been formed; again the pairwise optimization technique is used to 
improve on this partition. 

This method can make an error in the identification of the first set 
which will bias the choice of the second, and so on; the effect is most 
severe for the case where A - is large, so each set is small. 

The method of breaking off subsets sequentially has another potential 
flaw: regardless of the starting configuration, it will identify approxi- 
mately the same set each time it is used on a particular problem, and 
hence little is gained by using it twice on one cost matrix. However 
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variations in the order of performing pairwise optimizations can still 
produce different final partitions in general. 

Limited computational experience with sequential break-off followed 
by pairwise optimization suggests that it yields solutions which are on 
the average at least as good as (and sometimes slightly better than) 
those provided by pairwise optimization applied to an arbitrary /c-way 
starting partition. Pairwise optimization yields the optimum with a 
higher probability, however, because it is less susceptible to error caused 
by a bad choice made early. For instance, in tests on the 80 point matrix 
mentioned previously, sequential break off yielded 4-way solutions with 
a mean value of 358.6, but the lowest value found was 355. (The highest 
was 363.) These may be compared to 359.5, 352 and 365 for the standard 
partitioning method. 

Running time for the sequential break-off method is lower than for 
straight pairwise optimization. 

Insufficient data is available for a direct comparison between se- 
quential break off and the method of repeated subdivision. 

In all cases, the original process, be it a completely random generation 
of some initial configuration, or the production of a good starting parti- 
tion, is followed by a pairwise optimizing phase. It is unlikely that using 
better starting partitions will lead to worse results than random starts, 
on the average. Whether the possible improvement in results and running 
times will justify the extra computational effort required to generate the 
starting partition depends on the characteristics of the particular class 
of matrices being studied. 

Some limited experiments were performed to compare the present 
procedure with a multi-dimensional scaling technique 4 , on a Boolean 
matrix of 316 points, with about 1400 nonzero entries. The results in- 
dicated that the procedure identifies clusters well, even when no attempt 
is made to provide a good starting partition. 

3.3 Expansion Factor 

The introduction of dummy elements was mentioned in Section 2.6 
as a method of handling partitioning into subsets of unequal sizes. This 
can be viewed equally well as a means of introducing "slack" into a 
solution, in an attempt to get a lower overall cost by allowing "ex- 
pansion." That is, so far we have treated the problem of finding a parti- 
tion with a constraint on the sizes of the subsets, and on the number of 
subsets, since given kn points, we have tried to find the best partitions 
into exactly k subsets of n points each. Suppose we now relax this 
second constraint by permitting the addition of dummy elements to in- 
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Fig. 3 — Cost reduction by expansion. 

crease the size of the problem, and attempt to find the best solution in- 
volving any number (greater than or equal to k) of subsets, with at most 
n points in each. This solution with k or greater subsets will in general 
have a lower cost than the constrained solution. 

Figure 3 shows an example in which introducing slack permits a 
lower overall cost. Assume n is 3 and all nodes are size 1. The vertical 
edges have cost 1 and the horizontal ones cost 2. Any partition into 2 
equal subsets has a cost of at least 3, but there is an obvious partition 
into 3 subsets with cost 2. Any nontrivial partition into 4 or more sub- 
sets has a cost greater than 2, so 3 subsets represents the optimal ex- 
pansion. It is possible to find the minimal cost solution and the cor- 
responding optimal amount of expansion as follows. Suppose the problem 
has kn points to be partitioned into k sets of n points each. Starting with 
no slack (kn points), the optimal assignment is found. Then n dummies, 
enough to create one extra subset, are added, making a (/c -+- l)n prob- 
lem, and so on. Eventually, one subset is produced which consists 
entirely of dummies. When this occurs, we take the partition with this 
set of dummies removed as our optimum solution. 
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